Web archives capture the history of the Web and are therefore an important source for studying how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events. We discuss the opportunities and challenges of this approach and suggest a framework for creating sub-collections.